Chapter 5 - CCA: computer exercises

Author

Prof. Richard Wilkinson

Task 1

Consider again the crabs dataset you looked at in the exercises in the chapter on PCA. We now consider a canonical correlation analysis in which one set of variables, the \(\mathbf x\)-set, is given by CL and CW and the other set, the \(\mathbf y\)-set, is given by FL, RW and BD.

library(MASS)
?crabs                # read the help page to find out about the dataset
X1 = crabs |> dplyr::select(CL, CW) |> as.matrix()        # the x-set
Y1 = crabs |> dplyr::select(FL, RW, BD) |> as.matrix()    # the y-set
Sxx = cov(X1)         # find x-variable variance matrix
Syy = cov(Y1)         # find y-variable variance matrix
Sxy = cov(X1, Y1)     # find cross-covariance matrix
  1. Calculate \({\bf S}_{\bf x x}^{-1/2}\) and \({\bf S}_{\bf yy}^{-1/2}\) by first computing the spectral decomposition of \(\mathbf S_{xx}\) and \(\mathbf S_{yy}\).
  2. Now calculate the matrix \(\mathbf Q\) and compute its singular value decomposition.
  3. Compute the first pair of CC vectors and CC variables \(\eta_1\) and \(\psi_1\). What is the first canonical correlation?
  4. Plot \(\psi_1\) vs \(\eta_1\). What does the plot tell you (if anything)?
  5. Repeat the above to find the second pair of CC vectors and the second set of CC variables/scores, and plot these against each other and against the first CC scores. Is there any interesting structure in any of the plots? Which plots suggest random scatter?
  6. Finally, repeat the analysis above using the cc and plt.cc commands from the CCA package, which you will need to install.
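
The first three steps of the list above can be sketched in base R as follows. This is a minimal sketch rather than a model solution: it assumes that \(\mathbf Q\) is defined as \(\mathbf S_{xx}^{-1/2}\mathbf S_{xy}\mathbf S_{yy}^{-1/2}\), as is usual in CCA, and the helper inv_sqrt and the object names Sxx_is, Syy_is, Q_svd, a1, b1, eta1 and psi1 are hypothetical names introduced here for illustration.

```r
library(MASS)  # provides the crabs data

X <- as.matrix(crabs[, c("CL", "CW")])
Y <- as.matrix(crabs[, c("FL", "RW", "BD")])
Sxx <- cov(X)
Syy <- cov(Y)
Sxy <- cov(X, Y)

# Hypothetical helper: inverse symmetric square root of a positive definite
# matrix S, built from its spectral decomposition S = V diag(lambda) V^T.
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

Sxx_is <- inv_sqrt(Sxx)
Syy_is <- inv_sqrt(Syy)

# Assumed definition: Q = Sxx^{-1/2} Sxy Syy^{-1/2} (a 2 x 3 matrix here)
Q <- Sxx_is %*% Sxy %*% Syy_is
Q_svd <- svd(Q)

# First pair of CC vectors from the leading singular vectors of Q
a1 <- Sxx_is %*% Q_svd$u[, 1]
b1 <- Syy_is %*% Q_svd$v[, 1]

# First pair of CC variables (scores), computed from the centred data
Xc <- scale(X, center = TRUE, scale = FALSE)
Yc <- scale(Y, center = TRUE, scale = FALSE)
eta1 <- drop(Xc %*% a1)
psi1 <- drop(Yc %*% b1)

# The leading singular value of Q and the sample correlation of the
# scores should agree: both are the first canonical correlation.
Q_svd$d[1]
cor(eta1, psi1)
```

The second pair is obtained in the same way from the second columns of Q_svd$u and Q_svd$v, and plot(eta1, psi1) gives the scatter plot asked for in the list.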

Task 2

The data for previous Premier League seasons is available at:

https://www.rotowire.com/soccer/league-table.php?season=2022

There is a button to download the CSV (comma-separated values) file in the bottom right-hand corner. Read the data into R (hint: try the read.csv command).

# replace the path below with the location and name of the file you downloaded
x <- read.csv(file = "/YOURDIRECTORY/prem_league_data.csv", header = TRUE)

If you are not sure of the path to the directory where the file is located, a useful command is file.choose(), which opens a file browser and returns the full path of the file you select.

  1. Reproduce the analysis from the notes for the 2022-23 Premier League season. Does the analysis produce a similar result to that for the 2019-20 table?
  2. Give an interpretation of the CC scores. One way of doing this is to think about the correlation between the original variables and the scores (the transformed variables). Note that there are four different correlation matrices we can look at to aid interpretation: the correlation between \(X\) and \(\eta\), \(X\) and \(\psi\), \(Y\) and \(\eta\), and \(Y\) and \(\psi\).

Circle plots can also help. Look at the help page for plt.cc and try some circle plots.
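
The four correlation matrices mentioned above can be computed directly with cor(). The following is a minimal sketch under stated assumptions: because the column names of the downloaded league table are not known here, it uses the crabs data from Task 1 as a stand-in, and it uses cancor from base R rather than the CCA package so that nothing needs to be installed. The object names fit, eta, psi and cor_X_eta etc. are hypothetical.

```r
library(MASS)  # provides the crabs data, used here only as a stand-in

X <- as.matrix(crabs[, c("CL", "CW")])
Y <- as.matrix(crabs[, c("FL", "RW", "BD")])

fit <- cancor(X, Y)          # base R canonical correlation analysis
k <- seq_along(fit$cor)      # indices of the canonical pairs

# CC scores: centred data multiplied by the CC vectors
eta <- scale(X, center = fit$xcenter, scale = FALSE) %*% fit$xcoef[, k]
psi <- scale(Y, center = fit$ycenter, scale = FALSE) %*% fit$ycoef[, k]

# The four correlation matrices between original variables and scores
cor_X_eta <- cor(X, eta)
cor_X_psi <- cor(X, psi)
cor_Y_eta <- cor(Y, eta)
cor_Y_psi <- cor(Y, psi)

round(cor_X_eta, 2)
round(cor_Y_psi, 2)
```

Entries with large absolute value in a given column indicate which original variables are most strongly associated with that score, which is one way to attach a meaning to it.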

Task 3

We will now look at data measured on 600 first-year university students. Measurements were made on three psychological variables:

  • Locus of Control: the degree to which someone believes that they, as opposed to external forces, have control over the outcome of events in their lives.
  • Self Concept: an indication of whether a person tends to hold a generally positive and consistent or negative and variable self-view.
  • Motivation: how motivated an individual is.

which will form our \(\mathbf X\) variables. The \(\mathbf Y\) variables are four academic scores (standardized test scores)

  • Reading
  • Writing
  • Math
  • Science

and gender (1 = Male, 0 = Female). We are interested in how the set of psychological variables relates to the academic variables and gender.

mm <- read.csv("https://stats.idre.ucla.edu/stat/data/mmreg.csv")
colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math",
    "Science", "Sex")
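psych <- mm[, 1:3]   # psychological variables (the X set)
acad <- mm[, 4:7]    # the four academic scores; use mm[, 4:8] to also include Sex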

Conduct CCA on these data. Provide an interpretation of your results.
